On relative weighted entropies with central moments weight functions

Salimeh Yasaei Sekeh, Adriano Polpo


Abstract 


Following [1], the aim of this paper is to analyze the relative weighted entropy involving the central moments weight functions. We compare the standard relative entropy with the weighted case in two particular forms of Gaussian distributions. As an application, the weighted deviance information criterion is proposed.
1 Introduction: The weighted entropies

Let $\mathbf{X}$ be a real-valued random vector (RV) with a joint probability density function (PDF) $f$. The differential entropy (DE) of the RV $\mathbf{X}$ is defined by

\[ h(\mathbf{X}) := -\int_{\mathbb{R}^n} f(\mathbf{x}) \log f(\mathbf{x})\, d\mathbf{x}. \tag{1} \]

The definition and a number of inequalities for the standard DE were illustrated in [9, 3, 7]. Furthermore, the initial definition and results on weighted entropy were introduced in [?]. Following [?], methods similar to those used for the standard DE were recently applied in [?] to derive certain properties and applications of information-theoretic weighted entropies, together with a number of determinant-related inequalities.
Let $\mathbf{x} \in \mathbb{R}^n \mapsto \phi(\mathbf{x}) \ge 0$ be a given (measurable) function, called a weight function (WF). The weighted differential entropy (WDE) $h^{\rm w}_{\phi}(\mathbf{X})$ of a real-valued RV $\mathbf{X}$ with a PDF $f$ is given by

\[ h^{\rm w}_{\phi}(\mathbf{X}) = h^{\rm w}_{\phi}(f) := -\mathbb{E}\left[\phi(\mathbf{X}) \log f(\mathbf{X})\right] = -\int_{\mathbb{R}^n} \phi(\mathbf{x}) f(\mathbf{x}) \log f(\mathbf{x})\, d\mathbf{x}. \tag{2} \]

Note that the WDE (2) is defined for a given non-negative WF; when this function equals 1, the WDE coincides with the standard (Shannon) DE (1).

We also assume that the integrals in (1) and (2) converge absolutely; in other words, the WDE and DE are finite. The standard conventions $0 = 0 \cdot \log 0 = 0 \cdot \log \infty$ are adopted throughout the paper.
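As an illustration of definition (2), the following minimal sketch approximates the WDE of a one-dimensional Gaussian by a Riemann sum. The function name `weighted_entropy_1d`, the grid size and the parameter values are our own assumptions and are not part of the original text; the closed forms quoted in the comments are standard Gaussian facts.

```python
import numpy as np

def weighted_entropy_1d(pdf, weight, lo, hi, n=200_001):
    """Approximate -int weight(x) pdf(x) log pdf(x) dx on [lo, hi] by a Riemann sum."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    f = pdf(x)
    # Convention 0 * log 0 = 0: evaluate the logarithm only where f > 0.
    integrand = np.zeros_like(f)
    pos = f > 0
    integrand[pos] = weight(x[pos]) * f[pos] * np.log(f[pos])
    return -np.sum(integrand) * dx

mu, sigma = 1.0, 2.0   # hypothetical parameters

def gauss(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# phi = 1 recovers the standard DE, 0.5 * log(2*pi*e*sigma^2).
de = weighted_entropy_1d(gauss, lambda x: np.ones_like(x), mu - 12 * sigma, mu + 12 * sigma)
# phi(x) = (x - mu)^2 gives sigma^2 * (0.5 * log(2*pi*sigma^2) + 1.5) for a Gaussian.
wde = weighted_entropy_1d(gauss, lambda x: (x - mu) ** 2, mu - 12 * sigma, mu + 12 * sigma)
print(de, wde)
```

With the constant weight the routine reproduces the Shannon DE, which is the consistency check stated after (2).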

We now give the definitions of the conditional DE and the mutual DE for RVs, since these are the quantities on which our analysis focuses.

Definition 1.1 Let $\mathbf{X}_1 \in \mathbb{R}^{m_1}$ and $\mathbf{X}_2 \in \mathbb{R}^{m_2}$ be two RVs with joint PDF $f(\mathbf{x}_1, \mathbf{x}_2)$ and marginal PDFs $f_1(\mathbf{x}_1)$ and $f_2(\mathbf{x}_2)$. The conditional DE of $\mathbf{X}_1$ given $\mathbf{X}_2$ is defined by

\[ h(\mathbf{X}_1|\mathbf{X}_2) = -\int_{\mathbb{R}^{m_1+m_2}} f(\mathbf{x}_1, \mathbf{x}_2) \log \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_2(\mathbf{x}_2)}\, d\mathbf{x}_1 d\mathbf{x}_2. \tag{3} \]

Next, for the RV $\mathbf{X} = (X_1, X_2, \dots, X_n)$, we use the joint PDF $f_{X_1,\dots,X_n}$ and the marginal PDFs $f_{X_1}, f_{X_2}, \dots, f_{X_n}$ to define the mutual DE by

\[ I(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) = \int_{\mathbb{R}^n} f(\mathbf{x}) \log \frac{f(\mathbf{x})}{f_1(x_1) \cdots f_n(x_n)}\, d\mathbf{x}; \tag{4} \]

motivated by continuity, we set $0 \log \frac{0}{0} = 0$.

Here and below we use both notations $f(x_1, \dots, x_n)$ and $f_{X_1,\dots,X_n}$ for the joint PDF, which allows us to shorten expressions throughout the paper. In addition, we employ both $f_i(x_i)$ and $f_{X_i}$ for the marginal PDF of the random variable $X_i$, $i = 1, \dots, n$.


The following theorem was proven in [3].

Theorem 1.2 (Chain rule for the DE) Let $X_1, \dots, X_n$ be drawn according to the joint PDF $f(x_1, \dots, x_n)$. Then

\[ h(X_1, X_2, \dots, X_n) = \sum_{i=1}^{n} h(X_i | X_{i-1}, \dots, X_1). \tag{5} \]

As in the bivariate case, the mutual information can also be expressed in terms of marginal and conditional entropies. The proof follows directly by rewriting (4) in Definition 1.1 and is omitted.

Proposition 1.3 For an RV $\mathbf{X} \in \mathbb{R}^n$ with joint and marginal PDFs $f_{X_1,\dots,X_n}$ and $f_{X_1}, f_{X_2}, \dots, f_{X_n}$, we have

\[ I(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) = \sum_{i=1}^{n-1} \Big[ h(X_i) - h(X_i | X_{i+1}, \dots, X_n) \Big]. \tag{6} \]


Remark: An alternative expression for the mutual information, in terms of entropy and conditional entropy, is as follows:

\[ I(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) = \sum_{i=1}^{n-1} \mathbb{E}_{x_{i+1},\dots,x_n} \Big[ h(X_i) - h(X_i | x_{i+1}, \dots, x_n) \Big]. \tag{7} \]

Here

\[ h(X_i | x_{i+1}, \dots, x_n) = -\int_{\mathbb{R}} f(x_i | x_{i+1}, \dots, x_n) \log f(x_i | x_{i+1}, \dots, x_n)\, dx_i. \tag{8} \]


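As in Definition 1.2 of [13], let $\mathbf{x} \in \mathbb{R}^n \mapsto \phi(\mathbf{x}) \ge 0$ be a WF. The conditional WE of the RV $\mathbf{X}_1 \in \mathbb{R}^{m_1}$ given $\mathbf{X}_2 \in \mathbb{R}^{m_2}$ is defined by

\[ h^{\rm w}_{\phi}(\mathbf{X}_1|\mathbf{X}_2) = -\int_{\mathbb{R}^{m_1+m_2}} \phi(\mathbf{x}_1, \mathbf{x}_2) f(\mathbf{x}_1, \mathbf{x}_2) \log \frac{f(\mathbf{x}_1, \mathbf{x}_2)}{f_2(\mathbf{x}_2)}\, d\mathbf{x}_1 d\mathbf{x}_2, \tag{9} \]

and the mutual WE is given by

\[ I^{\rm w}_{\phi}(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) = \int_{\mathbb{R}^n} \phi(\mathbf{x}) f(\mathbf{x}) \log \frac{f(\mathbf{x})}{f_1(x_1) \cdots f_n(x_n)}\, d\mathbf{x}. \tag{10} \]
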


This concept is easily adapted to the weighted DE by using the quality of random variables, as 
explained in [1]. 


2 Relative weighted entropies 

The contribution of our paper in this setting is twofold:

1. We briefly improve several theorems obtained in [?] and give alternative definitions for a particular form of the WF.

2. We reformulate these results for the Gaussian distribution with two different covariance matrices.

2.1 Central moments weight functions

In this paper, and in this subsection in particular, we deal with central moments WFs, of the form $\phi(\mathbf{x}) = \prod_{i=1}^{n} (x_i - a_i)^2$ for constants $a_1, \dots, a_n$.

The naturalness of the definitions of the WE and the conditional WE is exhibited by the fact that the WE of a vector of random variables is the conditional WE of one of them plus the generalized conditional WE of the others. In other words, the chain rule can be adapted to the WE; accordingly, we reformulate it as follows:

Theorem 2.1 (Chain rule for the WE) Consider the RV $\mathbf{X} = (X_1, \dots, X_n)$ with joint PDF $f(x_1, \dots, x_n)$. Then for constants $a_1, \dots, a_n$ and the given WF $\phi(\mathbf{x}) = \prod_{i=1}^{n} (x_i - a_i)^2$,

\[ h^{\rm w}_{\phi}(X_1, \dots, X_n) = h^{\rm w}_{\phi}(X_n | X_{n-1}, \dots, X_1) + \sum_{i=1}^{n-1} h^{\rm w}_{\psi_i}(X_i | X_{i-1}, \dots, X_1). \]

Here, for constants $a_1, \dots, a_n$,

\[ \psi_i(x_1, \dots, x_i) = \prod_{j=1}^{i} (x_j - a_j)^2\, \mathbb{E}\Big( \prod_{l=i+1}^{n} (X_l - a_l)^2 \,\Big|\, (X_1, \dots, X_i) = (x_1, \dots, x_i) \Big). \tag{11} \]

Proof: If $(X_1, X_2)$ is a random pair, then in this particular case we have

\[ h^{\rm w}_{\phi}(X_1, X_2) = h^{\rm w}_{\phi}(X_2 | X_1) + h^{\rm w}_{\psi_1}(X_1), \]

where $\psi_1(x_1) = (x_1 - a_1)^2\, \mathbb{E}\big((X_2 - a_2)^2 | X_1 = x_1\big)$. More generally, for a random triple $(X_1, X_2, X_3)$ the WE is similarly given by

\[ h^{\rm w}_{\phi}(X_1, X_2, X_3) = h^{\rm w}_{\phi}(X_3 | X_2, X_1) + h^{\rm w}_{\psi_2}(X_2 | X_1) + h^{\rm w}_{\psi_1}(X_1), \]

where the WFs $\psi_1$ and $\psi_2$ are given by (11) with $i = 1, 2$. Applying the same argument to $n$ random variables, $n > 3$, we obtain

\[ \begin{aligned} h^{\rm w}_{\phi}(X_1, \dots, X_n) &= h^{\rm w}_{\phi}(X_2, \dots, X_n | X_1) + h^{\rm w}_{\psi_1}(X_1) \\ &= h^{\rm w}_{\phi}(X_3, \dots, X_n | X_2, X_1) + h^{\rm w}_{\psi_2}(X_2 | X_1) + h^{\rm w}_{\psi_1}(X_1) \\ &= h^{\rm w}_{\phi}(X_n | X_{n-1}, \dots, X_1) + h^{\rm w}_{\psi_{n-1}}(X_{n-1} | X_{n-2}, \dots, X_1) \\ &\quad + h^{\rm w}_{\psi_{n-2}}(X_{n-2} | X_{n-3}, \dots, X_1) + \dots + h^{\rm w}_{\psi_2}(X_2 | X_1) + h^{\rm w}_{\psi_1}(X_1). \end{aligned} \]

This leads to the desired result. □

At this stage, a natural question is whether similar conclusions extend to the weighted entropies. In fact, among all equivalent expressions for the mutual WE, as already observed for the mutual information, the most useful is the one in terms of the WE and the conditional WE.

Theorem 2.2 The weighted mutual information $I^{\rm w}_{\phi}(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n})$ can be written as follows:

\[ I^{\rm w}_{\phi}(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) = \sum_{j=1}^{n-1} h^{\rm w}_{\psi_j}(X_j) - h^{\rm w}_{\phi}(X_1, \dots, X_{n-1} | X_n), \]

where $\psi_j(x_j) = (x_j - a_j)^2\, \mathbb{E}\Big[ \prod_{i \ne j} (X_i - a_i)^2 \,\Big|\, X_j = x_j \Big]$.

Proof: Recalling (10), we observe that

\[ \begin{aligned} I^{\rm w}_{\phi}(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) &= \int_{\mathbb{R}^n} \prod_{i=1}^{n} (x_i - a_i)^2 f(x_1, \dots, x_n) \log f(x_1, \dots, x_{n-1} | x_n)\, d\mathbf{x} \\ &\quad - \int_{\mathbb{R}^n} \prod_{i=1}^{n} (x_i - a_i)^2 f(x_1, \dots, x_n) \sum_{j=1}^{n-1} \log f_j(x_j)\, d\mathbf{x}. \end{aligned} \]

Consequently,

\[ \begin{aligned} I^{\rm w}_{\phi}(f_{X_1,\dots,X_n}, f_{X_1} \cdots f_{X_n}) &= -h^{\rm w}_{\phi}(X_1, \dots, X_{n-1} | X_n) \\ &\quad - \sum_{j=1}^{n-1} \int_{\mathbb{R}} (x_j - a_j)^2 \int_{\mathbb{R}^{n-1}} \prod_{i \ne j} (x_i - a_i)^2 f(x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_n | x_j) f_j(x_j) \log f_j(x_j)\, d\mathbf{x} \\ &= -h^{\rm w}_{\phi}(X_1, \dots, X_{n-1} | X_n) - \sum_{j=1}^{n-1} \int_{\mathbb{R}} (x_j - a_j)^2\, \mathbb{E}\Big[ \prod_{i \ne j} (X_i - a_i)^2 \,\Big|\, X_j = x_j \Big] f_j(x_j) \log f_j(x_j)\, dx_j, \end{aligned} \tag{12} \]

which is precisely the desired result. □


Consider a real situation in which there are two dependent groups of components, that is, two dependent random vectors. In some experimental research one seeks to discriminate between the probability density functions when such vectors are independent and when they are dependent; this methodology clarifies the effect of dependence between random vectors. Indeed, one expects that the stronger the dependence between two groups of random data, the larger the information between the density functions. This will be examined for the Gaussian distribution through examples in the next subsection.

Proposition 2.3 Let $\mathbf{X} = (X_1, \dots, X_n)$ and $\mathbf{Y} = (Y_1, \dots, Y_m)$ be RVs describing a real situation, with joint PDF $f(x_1, \dots, x_n, y_1, \dots, y_m)$ and marginal multivariate PDFs $f_1(x_1, \dots, x_n)$ and $f_2(y_1, \dots, y_m)$ respectively. Then

\[ D(f_{X|y} \| f_X) = h^{\rm w}_{\phi_{X|y}}(\mathbf{X}) - h(\mathbf{X}|\mathbf{y}), \tag{13} \]

where $\phi_{X|y}(\mathbf{x}) = f(\mathbf{x}|\mathbf{y}) / f_1(\mathbf{x})$ and $h(\mathbf{X}|\mathbf{y})$ is defined as in (8).

In addition, let us define here the relative DE, known as the Kullback-Leibler divergence, of two given functions $f$ and $g$ by

\[ D(f \| g) = \int_{\mathbb{R}^n} f(\mathbf{x}) \log \frac{f(\mathbf{x})}{g(\mathbf{x})}\, d\mathbf{x}. \tag{14} \]

Proof: The proof follows directly from (14). □


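Remark: We note explicitly that, taking the expectation of $D(f_{X|y} \| f_X)$ with respect to the RV $\mathbf{Y}$ and using (13), we obtain the mutual DE:

\[ I(f_{X,Y}, f_X f_Y) = \mathbb{E}_{\mathbf{Y}} \Big[ D(f_{X|y} \| f_X) \Big] = \mathbb{E}_{\mathbf{Y}} \Big[ h^{\rm w}_{\phi_{X|y}}(\mathbf{X}) - h(\mathbf{X}|\mathbf{y}) \Big]. \tag{15} \]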


Returning to weighted information measures, we now present the weighted information between $f_{X|y}$ and $f_X$. A natural further question is whether the values taken by the RVs interact with the effects of dependence; we concentrate on this question in the next part of the paper.

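Definition 2.4 For two functions $\mathbf{x} \in \mathbb{R}^n \mapsto f(\mathbf{x}) \ge 0$ and $\mathbf{x} \in \mathbb{R}^n \mapsto g(\mathbf{x}) \ge 0$, the relative WE (the weighted Kullback-Leibler divergence) for a given WF $\phi$ is defined by

\[ D^{\rm w}_{\phi}(f \| g) = \int_{\mathbb{R}^n} \phi(\mathbf{x}) f(\mathbf{x}) \log \frac{f(\mathbf{x})}{g(\mathbf{x})}\, d\mathbf{x}. \tag{16} \]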
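To make Definition 2.4 concrete, here is a small numerical sketch in one dimension. The function names, the two zero-mean normal densities and the weight $x^2$ are illustrative assumptions of ours. With $f = N(0,1)$ and $g = N(0,4)$ the unweighted divergence equals $\log 2 + 1/8 - 1/2$, while the weighted one equals $\log 2 - 9/8$ and is negative, so the relative WE need not be non-negative.

```python
import numpy as np

def relative_weighted_entropy_1d(f, g, weight, lo, hi, n=200_001):
    """Approximate int weight(x) f(x) log(f(x)/g(x)) dx on [lo, hi] by a Riemann sum."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    fx, gx = f(x), g(x)
    pos = fx > 0                      # terms with f(x) = 0 contribute nothing
    vals = np.zeros_like(fx)
    vals[pos] = weight(x[pos]) * fx[pos] * np.log(fx[pos] / gx[pos])
    return np.sum(vals) * dx

def normal_pdf(sd):
    """Zero-mean normal density with standard deviation sd."""
    return lambda x: np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

f, g = normal_pdf(1.0), normal_pdf(2.0)
kl = relative_weighted_entropy_1d(f, g, lambda x: np.ones_like(x), -12.0, 12.0)
wkl = relative_weighted_entropy_1d(f, g, lambda x: x ** 2, -12.0, 12.0)
print(kl, wkl)   # the first value is positive, the second negative
```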


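Theorem 2.5 Under the same assumptions and by a method analogous to that of Proposition 2.3, for the given WF $\phi(\mathbf{x}) = \prod_{i=1}^{n} (x_i - a_i)^2$ with constants $a_1, \dots, a_n$, one obtains the following relationship:

\[ \begin{aligned} D^{\rm w}_{\phi}(f_{X|y} \| f_X) &= \int_{\mathbb{R}^n} \prod_{i=1}^{n} (x_i - a_i)^2 f(\mathbf{x}|\mathbf{y}) \log \frac{f(\mathbf{x}|\mathbf{y})}{f_1(\mathbf{x})}\, d\mathbf{x} \\ &= \int_{\mathbb{R}^n} \prod_{i=1}^{n} (x_i - a_i)^2 f(\mathbf{x}|\mathbf{y}) \log f(\mathbf{x}|\mathbf{y})\, d\mathbf{x} - \int_{\mathbb{R}^n} \prod_{i=1}^{n} (x_i - a_i)^2 \frac{f(\mathbf{x}|\mathbf{y})}{f_1(\mathbf{x})} f_1(\mathbf{x}) \log f_1(\mathbf{x})\, d\mathbf{x} \\ &= h^{\rm w}_{\phi'_{X|y}}(\mathbf{X}) - h^{\rm w}_{\phi}(\mathbf{X}|\mathbf{y}), \end{aligned} \tag{17} \]

where

\[ \phi'_{X|y}(\mathbf{x}) = \prod_{i=1}^{n} (x_i - a_i)^2 \left[ \frac{f(\mathbf{x}|\mathbf{y})}{f_1(\mathbf{x})} \right]. \tag{18} \]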
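A relation similar to (15) also holds in the weighted case:

\[ \begin{aligned} \mathbb{E}_{\mathbf{Y}} \Big[ \prod_{j=1}^{m} (Y_j - a_j)^2 D^{\rm w}_{\phi}(f_{X|y} \| f_X) \Big] &= \int_{\mathbb{R}^m} \prod_{j=1}^{m} (y_j - a_j)^2 f_2(\mathbf{y}) \int_{\mathbb{R}^n} \prod_{i=1}^{n} (x_i - a_i)^2 f(\mathbf{x}|\mathbf{y}) \log \frac{f(\mathbf{x}|\mathbf{y})}{f_1(\mathbf{x})}\, d\mathbf{x}\, d\mathbf{y} \\ &= \int_{\mathbb{R}^{n+m}} \prod_{j=1}^{m} (y_j - a_j)^2 \prod_{i=1}^{n} (x_i - a_i)^2 f(\mathbf{x}, \mathbf{y}) \log \frac{f(\mathbf{x}, \mathbf{y})}{f_1(\mathbf{x}) f_2(\mathbf{y})}\, d\mathbf{x}\, d\mathbf{y}. \end{aligned} \]

Hence, as the last piece of discussion in this subsection, we point out that the mutual WE can be obtained by computing a product moment of the random vector $\mathbf{Y}$ in which $D^{\rm w}_{\phi}(f_{X|y} \| f_X) f_2(\mathbf{y})$ plays the role of the density function:

\[ I^{\rm w}_{\phi}(f_{X,Y}, f_X f_Y) = \mathbb{E}_{\mathbf{Y}} \Big[ \prod_{j=1}^{m} (Y_j - a_j)^2 D^{\rm w}_{\phi}(f_{X|y} \| f_X) \Big] = \mathbb{E}_{\mathbf{Y}} \Big[ \prod_{j=1}^{m} (Y_j - a_j)^2 \Big\{ h^{\rm w}_{\phi'_{X|y}}(\mathbf{X}) - h^{\rm w}_{\phi}(\mathbf{X}|\mathbf{y}) \Big\} \Big]. \tag{19} \]

The WF $\phi'_{X|y}$ has the form given in (18).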


2.2 Gaussian distribution 

The Gaussian distribution is the most useful, and most studied, of the standard joint distributions in probability. A large body of statistical theory depends on the properties of families of random variables whose joint distribution is at least approximately multivariate normal. Many statistical procedures implicitly require bivariate (or multivariate, for more than two random variables) normality. Moreover, the hypothesis of dependence between random variables has always been at the center of researchers' attention; hence in this subsection we focus on dependent RVs with a Gaussian distribution.

Throughout this part of our research we give two types of Gaussian examples with different covariance matrices. Using the same technique as before, general formulas for $n = 3$ are given. Furthermore, we observe the role of the correlation coefficient $\rho$ in the relative measure for the weighted and standard forms.

Consider $\mathbf{X} \sim N(\mu, \Sigma)$; one of the results in [3] explicitly shows that the entropy of this family does not depend on $\mu$:

\[ h(\mathbf{X}) = \frac{1}{2} \log \big[ (2\pi e)^n |\Sigma| \big], \tag{20} \]

where $|\Sigma|$ is the determinant of the matrix $\Sigma$.
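For reference, the Gaussian entropy formula can be evaluated as in the short sketch below, written with the standard normalization $\frac{1}{2}\log[(2\pi e)^n |\Sigma|]$. The helper name `gaussian_entropy` is ours; the log-determinant is computed with `numpy.linalg.slogdet` to avoid overflow for large matrices.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of N(mu, cov): 0.5 * log((2*pi*e)^n * det(cov))."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (n * np.log(2 * np.pi * np.e) + logdet)

# The mean does not enter: only the covariance matrix is needed.
print(gaussian_entropy(np.eye(2)))
```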
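However, in the Gaussian case, by virtue of (2) with the WF $\phi(\mathbf{x}) = \prod_{i=1}^{3} (x_i - \mu_i)^2$, the WE admits a representation depending on the mean $\mu$ (see the Appendix):

\[ h^{\rm w}_{\phi}(\mathbf{X}) = \frac{1}{2} \log \big( (2\pi)^3 |\Sigma| \big)\, \Theta + \frac{1}{2} \sum_{i=1}^{3} \sum_{j=1}^{3} \Sigma^{-1}_{ij} \Lambda_{ij}. \tag{21} \]

Here

\[ \begin{aligned} \Lambda_{ij} &= \big( \Sigma_{11}\Sigma_{22} + 2(\Sigma_{12})^2 \big) \big( \Sigma_{33}\Sigma_{ij} + 2\Sigma_{3i}\Sigma_{3j} \big), \\ \Theta &= \Sigma_{11} \big( \Sigma_{22}\Sigma_{33} + 2(\Sigma_{23})^2 \big) + 2\Sigma_{12} \big( \Sigma_{12}\Sigma_{33} + 2\Sigma_{13}\Sigma_{23} \big) + 2\Sigma_{13} \big( 2\Sigma_{12}\Sigma_{23} + \Sigma_{13}\Sigma_{22} \big). \end{aligned} \tag{22} \]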
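The expression for $\Theta = \mathbb{E}\big[\prod_{k=1}^{3}(X_k-\mu_k)^2\big]$ in (22) is a sixth-order Gaussian moment, so it can be cross-checked with Isserlis' (Wick's) theorem, which writes such a moment as a sum over all pairings of the indices. The sketch below is only a sanity check under that standard theorem; the helper names and the test covariance matrix are our own choices.

```python
import numpy as np

def pairings(items):
    """Yield every way of splitting an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest)):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, rest[k])] + tail

def gaussian_moment(cov, indices):
    """E[prod_k Y_{indices[k]}] for zero-mean Gaussian Y with covariance cov (Isserlis)."""
    if len(indices) % 2:
        return 0.0   # odd-order central moments vanish
    return sum(np.prod([cov[a, b] for a, b in p]) for p in pairings(list(indices)))

def theta_closed_form(S):
    """Right-hand side of the expression for Theta in (22), with 0-based indices."""
    return (S[0, 0] * (S[1, 1] * S[2, 2] + 2 * S[1, 2] ** 2)
            + 2 * S[0, 1] * (S[0, 1] * S[2, 2] + 2 * S[0, 2] * S[1, 2])
            + 2 * S[0, 2] * (2 * S[0, 1] * S[1, 2] + S[0, 2] * S[1, 1]))

# Hypothetical positive definite covariance matrix used only for the check.
S = np.array([[2.0, 0.3, -0.4], [0.3, 1.0, 0.5], [-0.4, 0.5, 1.5]])
theta_isserlis = gaussian_moment(S, [0, 0, 1, 1, 2, 2])
print(theta_isserlis, theta_closed_form(S))
```

The same `gaussian_moment` helper could be used to evaluate higher-order moments of the kind that enter $\Lambda_{ij}$ numerically.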


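Recall that the random pair $\mathbf{X}_1 = (X_1, X_2)$ has a Gaussian distribution with mean $\mu = (\mu_1, \mu_2)^T$ and covariance matrix

\[ \Sigma_1 = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \]

Hence the RV $(X_1, X_2 | X_3 = x_3) \sim N(\bar{\mu}, \bar{\Sigma})$, where

\[ \bar{\mu} = \begin{pmatrix} \bar{\mu}_1 \\ \bar{\mu}_2 \end{pmatrix} = \begin{pmatrix} \mu_1 + \Sigma_{13}\Sigma_{33}^{-1}(x_3 - \mu_3) \\ \mu_2 + \Sigma_{23}\Sigma_{33}^{-1}(x_3 - \mu_3) \end{pmatrix}, \tag{23} \]

\[ \bar{\Sigma} = \begin{pmatrix} \bar{\Sigma}_{11} & \bar{\Sigma}_{12} \\ \bar{\Sigma}_{21} & \bar{\Sigma}_{22} \end{pmatrix} = \begin{pmatrix} \Sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31} & \Sigma_{12} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32} \\ \Sigma_{21} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{31} & \Sigma_{22} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{32} \end{pmatrix}. \tag{24} \]

Further, we present the closed formula for $D(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)})$, which we shall later compare with the weighted one, in line with the main purpose of this work.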
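The conditional mean (23) and covariance (24) are the usual Schur-complement formulas, and they can be computed as in the following sketch. The function `condition_gaussian`, its argument names and the numerical values are illustrative assumptions, not part of the original text.

```python
import numpy as np

def condition_gaussian(mu, cov, keep, given, value):
    """Mean and covariance of X[keep] given X[given] = value, for X ~ N(mu, cov)."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    value = np.atleast_1d(np.asarray(value, dtype=float))
    S_kk = cov[np.ix_(keep, keep)]
    S_kg = cov[np.ix_(keep, given)]
    S_gg = cov[np.ix_(given, given)]
    # Solve with S_gg rather than inverting it explicitly (S_gg is symmetric).
    gain = np.linalg.solve(S_gg, S_kg.T).T           # S_kg S_gg^{-1}
    cond_mean = mu[keep] + gain @ (value - mu[given])
    cond_cov = S_kk - gain @ S_kg.T                  # Schur complement
    return cond_mean, cond_cov

# Hypothetical three-dimensional example: condition (X1, X2) on X3 = 1.
cov = np.array([[1.0, 0.2, 0.5], [0.2, 1.0, 0.3], [0.5, 0.3, 2.0]])
m, C = condition_gaussian([0.0, 1.0, -1.0], cov, keep=[0, 1], given=[2], value=1.0)
print(m)
print(C)
```

Note that the conditional covariance does not depend on the observed value $x_3$; only the conditional mean does.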


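Going back to Proposition 2.3, the relative DE admits the representation

\[ \begin{aligned} D(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)}) &= h^{\rm w}_{\phi_{(X_1,X_2)|x_3}}(X_1, X_2) - h((X_1, X_2)|x_3) \\ &= \frac{1}{2} \log \big( (2\pi)^2 |\Sigma_1| \big) - \frac{1}{2} \log \big( (2\pi e)^2 |\bar{\Sigma}| \big) \\ &\quad + \frac{1}{2} \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \bar{\Sigma}_{ij} + \bar{\mu}_i\bar{\mu}_j - \mu_j\bar{\mu}_i - \mu_i\bar{\mu}_j + \mu_i\mu_j \big\}. \end{aligned} \tag{25} \]

Note that here $\phi_{(X_1,X_2)|x_3}(x_1, x_2) = f(x_1, x_2 | x_3) / f(x_1, x_2)$. For simplicity and to avoid confusion, we write $\Sigma^{-1}_{1,ij}$ for the entries of the concentration matrix $\Sigma_1^{-1}$.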
















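The conditional DE $h((X_1, X_2)|x_3)$ follows directly from (20) and

\[ (X_1, X_2 | X_3 = x_3) \sim N(\bar{\mu}, \bar{\Sigma}). \]

On the other hand, for $h^{\rm w}_{\phi_{(X_1,X_2)|x_3}}(X_1, X_2)$ we have the respective formula:

\[ \begin{aligned} h^{\rm w}_{\phi_{(X_1,X_2)|x_3}}(X_1, X_2) &= \log \big( 2\pi |\Sigma_1|^{1/2} \big) + \frac{1}{2} \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \int_{\mathbb{R}^2} f(x_1, x_2 | x_3)(x_i - \mu_i)(x_j - \mu_j)\, dx_1 dx_2 \\ &= \log \big( 2\pi |\Sigma_1|^{1/2} \big) + \frac{1}{2} \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \mathbb{E}[X_i X_j | X_3] - \mu_j \mathbb{E}[X_i | X_3] - \mu_i \mathbb{E}[X_j | X_3] + \mu_i\mu_j \big\} \\ &= \log \big( 2\pi |\Sigma_1|^{1/2} \big) + \frac{1}{2} \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \bar{\Sigma}_{ij} + \bar{\mu}_i\bar{\mu}_j - \mu_j\bar{\mu}_i - \mu_i\bar{\mu}_j + \mu_i\mu_j \big\}. \end{aligned} \]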

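In addition, according to (17), we can give an explicit expression for $D^{\rm w}_{\phi}(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)})$. Define

\[ \Theta(x_3) = \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \,\Big|\, X_3 \Big], \qquad \Lambda_{ij} = \mathbb{E} \Big[ \prod_{k=1}^{2} (X_k - \mu_k)^2 (X_i - \bar{\mu}_i)(X_j - \bar{\mu}_j) \,\Big|\, X_3 \Big]. \tag{26} \]

Then we get

\[ h^{\rm w}_{\phi}(X_1, X_2 | X_3 = x_3) = \frac{1}{2} \log \big( (2\pi)^2 |\bar{\Sigma}| \big)\, \Theta(x_3) + \frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{2} \bar{\Sigma}^{-1}_{ij} \Lambda_{ij}. \tag{27} \]

We confine to the Appendix the calculations of the precise values of $\Theta(x_3)$ and $\Lambda_{ij}$.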


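Now for the WF

\[ \phi'_{(X_1,X_2)|x_3}(x_1, x_2) = \prod_{i=1}^{2} (x_i - \mu_i)^2 \left[ \frac{f(x_1, x_2 | x_3)}{f(x_1, x_2)} \right], \]

we draw the reader's attention to the following assertion:

\[ h^{\rm w}_{\phi'_{(X_1,X_2)|x_3}}(X_1, X_2) = \frac{1}{2} \log \big[ (2\pi)^2 |\Sigma_1| \big]\, \Theta(x_3) + \frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{2} \Sigma^{-1}_{1,ij} T_{ij}. \tag{28} \]

Here $\Theta(x_3)$ is defined as in (26) and

\[ T_{ij} = \mathbb{E} \Big[ \prod_{k=1}^{2} (X_k - \mu_k)^2 (X_i - \mu_i)(X_j - \mu_j) \,\Big|\, X_3 \Big]. \tag{29} \]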
As before, the explicit expressions of $\Theta(x_3)$ and $T_{ij}$ are given in the Appendix.

Following Theorem 2.5, we combine (27) and (28) and obtain the following rather long expression for the relative WE:

\[ D^{\rm w}_{\phi}(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)}) = \frac{1}{2} \log \big[ |\Sigma_1| / |\bar{\Sigma}| \big]\, \Theta(x_3) + \frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{2} \Sigma^{-1}_{1,ij} T_{ij} - \frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{2} \bar{\Sigma}^{-1}_{ij} \Lambda_{ij}. \tag{30} \]


Consequently, it is now clear that both the Kullback-Leibler and the weighted Kullback-Leibler information measures for the Gaussian conditional pair $(X_1, X_2)|X_3 = x_3$ depend on the means and on the covariances between all the random variables. It is therefore natural to ask about the effect of the correlations on them, and whether this effect on the Kullback-Leibler information is completely analogous to that on the weighted one.

One can imagine that if the dependence between $(X_1, X_2)$ and $X_3$ increases, then the density of the conditional vector $(X_1, X_2)|X_3 = x_3$ moves further away from the joint density of $(X_1, X_2)$; in other words, knowing a dependent random variable $X_3$ gives more information.

To support this claim we present more evidence, and two particular examples are considered in the following.


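Example 2.6 In the Gaussian case assume $\mu = 0$ and

\[ \Sigma = \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & 0 \\ \rho^2 & 0 & 1 \end{pmatrix}; \]

then

\[ (X_1, X_2 | X_3 = x_3) \sim N(\bar{\mu}, \bar{\Sigma}), \]

where

\[ \bar{\mu} = \begin{pmatrix} \rho^2 x_3 \\ 0 \end{pmatrix} \quad \text{and} \quad \bar{\Sigma} = \begin{pmatrix} 1 - \rho^4 & \rho \\ \rho & 1 \end{pmatrix}. \]

With $\Sigma_1$ as before, i.e. the covariance matrix of the pair $(X_1, X_2)$, one obtains $|\Sigma_1| = 1 - \rho^2$, $|\bar{\Sigma}| = 1 - \rho^2 - \rho^4$ and

\[ \begin{aligned} \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \bar{\Sigma}_{ij} + \bar{\mu}_i\bar{\mu}_j - \mu_j\bar{\mu}_i - \mu_i\bar{\mu}_j + \mu_i\mu_j \big\} &= \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \bar{\Sigma}_{ij} + \bar{\mu}_i\bar{\mu}_j \big\} \\ &= \Sigma^{-1}_{1,11} \big\{ \bar{\Sigma}_{11} + \bar{\mu}_1^2 \big\} + \Sigma^{-1}_{1,12}\bar{\Sigma}_{12} + \Sigma^{-1}_{1,21}\bar{\Sigma}_{21} + \Sigma^{-1}_{1,22}\bar{\Sigma}_{22} \\ &= \frac{\rho^4}{1 - \rho^2}(x_3^2 - 1) + 2. \end{aligned} \]

Using the above expression in (25), we obtain

\[ D(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)}) = \frac{1}{2} \log \left[ \frac{1 - \rho^2}{1 - \rho^2 - \rho^4} \right] + \frac{\rho^4}{2(1 - \rho^2)}(x_3^2 - 1). \tag{31} \]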
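The relative DE of Example 2.6 can be checked numerically against the general Kullback-Leibler formula for two Gaussian laws. In the sketch below, `gaussian_kl` implements that standard formula, and `relative_de_example` encodes the closed form $\frac{1}{2}\log\frac{1-\rho^2}{1-\rho^2-\rho^4} + \frac{\rho^4(x_3^2-1)}{2(1-\rho^2)}$ as we read (31); both function names and the chosen values of $\rho$ and $x_3$ are our own assumptions.

```python
import numpy as np

def gaussian_kl(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ) in nats."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    S0, S1 = np.asarray(S0, float), np.asarray(S1, float)
    d = mu0 - mu1
    k = len(mu0)
    trace_term = np.trace(np.linalg.solve(S1, S0))
    quad_term = d @ np.linalg.solve(S1, d)
    logdet_ratio = np.linalg.slogdet(S1)[1] - np.linalg.slogdet(S0)[1]
    return 0.5 * (logdet_ratio - k + trace_term + quad_term)

def relative_de_example(rho, x3):
    """Closed form for the covariance structure of Example 2.6."""
    return (0.5 * np.log((1 - rho**2) / (1 - rho**2 - rho**4))
            + rho**4 * (x3**2 - 1) / (2 * (1 - rho**2)))

rho, x3 = 0.4, 1.5
S1 = np.array([[1.0, rho], [rho, 1.0]])                # covariance of (X1, X2)
S_bar = np.array([[1.0 - rho**4, rho], [rho, 1.0]])    # covariance given X3 = x3
mu_bar = np.array([rho**2 * x3, 0.0])                  # mean given X3 = x3
print(gaussian_kl(mu_bar, S_bar, np.zeros(2), S1), relative_de_example(rho, x3))
```

The closed form is only valid while $1 - \rho^2 - \rho^4 > 0$, that is, while $\bar{\Sigma}$ remains positive definite.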
Looking at this equation, we cannot immediately tell whether it is a monotonic function of $\rho$, but it is clearly an even function of $\rho$ and of $x_3$; that is, the relative DE does not depend on the sign of the correlation coefficient $\rho$. Moreover, from (31) it is clear that the absolute value of $x_3$ affects the information.



Figure 2.2.1. A: $\rho$ values and $x_3$ values; B: $\rho$ values; C: $x_3$ values.


In Figure 2.2.1 we see, first, that the relative DE takes non-negative values (Gibbs inequality, see [?]). Second, the information increases as the absolute values of $\rho$ and $x_3$ increase, which coincides with our expectation; see Figures 2.2.1(B) and 2.2.1(C). In other words, in this example the dependence between $X_1$ and $X_3$, measured by $\rho^2$, and the discrimination between $f_{(X_1,X_2)|x_3}$ and $f_{(X_1,X_2)}$ change in the same direction. Since we concentrate on the information between the distributions of $(X_1, X_2)|X_3 = x_3$ and $(X_1, X_2)$, the correlation between $X_1$ and $X_2$ is not our concern here.

Now we switch to the relative WE in order to discover whether a similar impression extends to the weighted case. It is worth noting that, since weighted information takes into account the values of the RVs $X_1$, $X_2$ in addition to the probabilities, our perception may change.
Consider $\rho$ and $\Sigma$ as already given in Example 2.6; we get

\[ \Theta(x_3) = 1 + 2\rho^2 + \rho^4 (x_3^2 - 1), \]
h-ij = a ij(p) + P^ x 3 ((1 — P^)^ij + 2XijSij 


(32) 


here 


otij(a) — (1 — p ) I Sjj + 2 X 21 X 2 ^ ) + 2p| pTiij + XijX2j + XyX 


J 2i 


+Xii (^2pYi2j + Xy) + Xy ^2pX2i + Xh ) • 


Set /?y(p) = (/+ - Pi){Pj - Pj), we have 


l + 2^ + p 4 (^-l) 


fjj — a+(p) + Pij(p)- 


+2 p X 3 (/+ — pi). I Xjj + 2pT,2j I + (p x 3 ). I Xjj + 2X2jX 


+ 2 p 2 X3(pj Xii + 2pX 2 ij 


" 2 j 


(33) 


By virtue of (f30T) . we obtain 


(/(Xi,X 2 )|x 3 > f(x lt x 2 )) 

„2 


= 


1 


1 + 2/5 + p (x 3 — 1) 


. 1 - P 2 - /0 4 

+ 2 ^ — ^ 2 ) ( 3 (! “ ^ 4 ) 2 + 3 (! - P 4 ) + 6 P 2 - 6 P 4 - 6 P 6x 3 + 9/x| - 6+> 8 ar§ + p 8 x\ 
1 


(34) 


6p 2 (l - p 4 ) + 6(1 - p 4 ) 2 + 4/? 4 (l - p A )xi - 12p 4 - 4p s (l - p 4 )* 2 


2 ( 1 -P 2 -P 4 ) 

The following theorem was presented in [?].

Theorem 2.7 (The weighted Gibbs inequality) Given non-negative functions $f$, $g$ and $\phi$, assume the bound

\[ \int \phi(\mathbf{x}) \big[ f(\mathbf{x}) - g(\mathbf{x}) \big]\, d\mathbf{x} \ge 0. \tag{35} \]

Then

\[ D^{\rm w}_{\phi}(f \| g) \ge 0, \]

with equality iff $g = f$.

The condition (35) is rewritten as

\[ \int_{\mathbb{R}^2} \prod_{i=1}^{2} (x_i - \mu_i)^2 \big[ f(x_1, x_2 | x_3) - f(x_1, x_2) \big]\, dx_1 dx_2 = \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \,\Big|\, X_3 \Big] - \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \Big] \ge 0. \]

Correspondingly, in this example we obtain:

\[ \Theta(x_3) - \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \Big] = \rho^4 (x_3^2 - 1). \]

This states that for $x_3 \in (-1, 1)$ the condition (35) is violated and there is no guarantee that the relative WE takes non-negative values, whereas $D^{\rm w}_{\phi}(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)}) \ge 0$; see Figure 2.2.2(A). Further, a pattern similar to that of the relative DE is confirmed for the relative WE in Figures 2.2.2(B) and 2.2.2(C), with the same values of $\rho$ and $x_3$.







Our observations are still preliminary, and we think that further examples are needed here to build a detailed picture. Hence let us now turn to another special case of the Gaussian distribution, taken from Example 3.4.1, page 39, of [?]:


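Example 2.8 Let $\mathbf{X} = (X_1, X_2, X_3)$ be distributed according to an $N(0, \Sigma)$ distribution, where $\Sigma_{ii} = 1$ $(i = 1, 2, 3)$, $\Sigma_{12} = 1 - 2\rho$, $\Sigma_{13} = \Sigma_{23} = 1 - \rho$ and $\rho \in (0, \tfrac{1}{2})$. For every fixed $C = (C_1, C_2, C_3)^T \in \mathbb{R}^3$ we can write

\[ C^T \Sigma C = (1 - \rho)(C_1 + C_2 + C_3)^2 + \rho (C_1 - C_2)^2 + \rho C_3^2. \]

Since $C^T \Sigma C \ge 0$ holds, with equality if and only if $C_1 = C_2 = C_3 = 0$, $\Sigma$ is a positive definite matrix. We then have

\[ \bar{\mu} = \begin{pmatrix} (1 - \rho) x_3 \\ (1 - \rho) x_3 \end{pmatrix}, \qquad \bar{\Sigma} = \begin{pmatrix} \rho(2 - \rho) & -\rho^2 \\ -\rho^2 & \rho(2 - \rho) \end{pmatrix}. \]

Owing to (25) we first calculate

\[ \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \bar{\Sigma}_{ij} + \bar{\mu}_i\bar{\mu}_j - \mu_j\bar{\mu}_i - \mu_i\bar{\mu}_j + \mu_i\mu_j \big\} = \sum_{i,j=1,2} \Sigma^{-1}_{1,ij} \big\{ \bar{\Sigma}_{ij} + \bar{\mu}_i\bar{\mu}_j \big\} = 1 + \rho + (1 - \rho) x_3^2. \]

Figure 2.2.3. A: $\rho$ values and $x_3$ values; C: $x_3$ values.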
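Therefore, for $\rho \in (0, \tfrac{1}{2})$, we derive

\[ \begin{aligned} D(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)}) &= \frac{1}{2} \log \big[ (2\pi)^2\, 4\rho(1 - \rho) \big] - \frac{1}{2} \log \big[ (2\pi e)^2\, 4\rho^2(1 - \rho) \big] + \frac{1}{2} \big[ 1 + \rho + (1 - \rho) x_3^2 \big] \\ &= \frac{1}{2} \big[ 1 + \rho + (1 - \rho) x_3^2 - \log(\rho) \big] - 1. \end{aligned} \tag{36} \]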
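The quantities of Example 2.8 can be verified numerically as in the sketch below: it checks that $\Sigma$ is positive definite, forms the conditional law of $(X_1, X_2)$ given $X_3 = x_3$, and compares the general Gaussian Kullback-Leibler formula with the closed form $\frac{1}{2}[1 + \rho + (1-\rho)x_3^2 - \log\rho] - 1$. Function names and the test values of $\rho$ and $x_3$ are assumptions made for illustration.

```python
import numpy as np

def sigma_example(rho):
    """Covariance matrix of Example 2.8."""
    return np.array([[1.0, 1 - 2 * rho, 1 - rho],
                     [1 - 2 * rho, 1.0, 1 - rho],
                     [1 - rho, 1 - rho, 1.0]])

def relative_de_direct(rho, x3):
    """KL between N(mu_bar, S_bar) and N(0, S1), from the generic Gaussian formula."""
    S = sigma_example(rho)
    S1 = S[:2, :2]
    gain = S[:2, 2] / S[2, 2]                  # Sigma_{i3} / Sigma_{33}, i = 1, 2
    mu_bar = gain * x3                         # conditional mean (the mean is zero)
    S_bar = S1 - np.outer(gain, S[2, :2])      # conditional covariance
    logdet_ratio = np.log(np.linalg.det(S1) / np.linalg.det(S_bar))
    return 0.5 * (logdet_ratio - 2 + np.trace(np.linalg.solve(S1, S_bar))
                  + mu_bar @ np.linalg.solve(S1, mu_bar))

def relative_de_closed(rho, x3):
    """Closed form: 0.5 * (1 + rho + (1 - rho) * x3^2 - log(rho)) - 1."""
    return 0.5 * (1 + rho + (1 - rho) * x3**2 - np.log(rho)) - 1

rho, x3 = 0.3, 1.2
min_eig = np.linalg.eigvalsh(sigma_example(rho)).min()
print(min_eig > 0)                                   # positive definiteness
print(relative_de_direct(rho, x3), relative_de_closed(rho, x3))
```

Evaluating `relative_de_closed` on a grid of $\rho$ values in $(0, \tfrac{1}{2})$ also makes the monotone decrease in $\rho$ visible.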

Next, Figure 2.2.3 shows a more interesting behavior. The function $D(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)})$ takes non-negative values. However, the relative DE decreases in $\rho \in (0, \tfrac{1}{2})$. In other words, this is another example showing that our perception holds true: since the correlation between $X_1$ and $X_3$ is $1 - \rho$, when the dependence between the RVs $X_1$ and $X_3$ rises, the information increases as well.


Furthermore, the following expression for $\Theta(x_3)$ emerges:

\[ \Theta(x_3) = \rho^2 (2 - \rho)^2 + 2\rho^4 + 2\rho(2 - \rho)(1 - \rho)^2 x_3^2 - 4\rho^2 (1 - \rho)^2 x_3^2 + (1 - \rho)^4 x_3^4. \tag{37} \]


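Therefore, owing to (30), one obtains

\[ \begin{aligned} D^{\rm w}_{\phi}(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)}) &= \frac{1}{2} \log \left[ \frac{4\rho(1 - \rho)}{\rho^2 (2 - \rho)^2 - \rho^4} \right] \Theta(x_3) + \frac{1}{8\rho(1 - \rho)} \Big( (T_{11} + T_{22}) + 2(2\rho - 1) T_{12} \Big) \\ &\quad - \frac{1}{2 \big( \rho^2 (2 - \rho)^2 - \rho^4 \big)} \Big( \rho(2 - \rho)(\Lambda_{11} + \Lambda_{22}) + 2\rho^2 \Lambda_{12} \Big). \end{aligned} \tag{38} \]

Here $\Theta(x_3)$ is as in (37), and the explicit forms of $\Lambda_{ij}$ and $T_{ij}$ are given in the Appendix.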

Let us now check the status of the condition (35):

\[ \begin{aligned} &\Theta(x_3) - \big( 1 + 2(1 - 2\rho)^2 \big) \\ &\quad = \rho^2 (2 - \rho)^2 + 2\rho^4 + 2\rho(2 - \rho)(1 - \rho)^2 x_3^2 - 4\rho^2 (1 - \rho)^2 x_3^2 + (1 - \rho)^4 x_3^4 - 1 - 2(1 - 2\rho)^2. \end{aligned} \tag{39} \]

Analyzing (39), one finds that the condition (35) does not hold for all values of $\rho$ and $x_3$, whereas, as Figure 2.2.4 shows, the relative WE is non-negative (within the indicated range of $(\rho, x_3)$).

Finally, the plots in Figures 2.2.4(B) and 2.2.4(C) give the impression that the behavior of $D^{\rm w}_{\phi}(f_{(X_1,X_2)|x_3} \| f_{(X_1,X_2)})$ is more complicated. In other words, in this example the information does not behave monotonically with respect to $\rho$ and $x_3$. Consequently, in contrast to the standard case, in the weighted form one cannot conclude that the dependence between $X_1$ and $X_3$ directly affects the information.

Figure 2.2.4. B: $\rho$ values; C: $x_3$ values.


3 An application: Weighted deviance information criterion 


To conclude the paper, in this section we briefly demonstrate an application of the relative DE and WE to model selection in Bayesian analysis, cf. [?].

Assume that $f(y)$ and $g(y)$ respectively represent the PDFs of the "true model" and the "approximating model" on the same measurable space. For a given WF $\phi$, the relative WE, or weighted Kullback-Leibler divergence, is given by:

\[ D^{\rm w}_{\phi}(f \| g) = \mathbb{E}_{\tilde{y}} \big[ \phi(\tilde{y}) \log f(\tilde{y}) \big] - \mathbb{E}_{\tilde{y}} \big[ \phi(\tilde{y}) \log g(\tilde{y}) \big]. \tag{40} \]

Note that such a quantity is not always non-negative. The smaller the value of $D^{\rm w}_{\phi}$, the closer we consider the model $g$ to be to the true distribution. Since the first term of (40) does not depend on $g$, in practice it can be ignored in model comparison for given data $y = (y_1, \dots, y_n)$ with weights $\phi(y)$.

As $n$ increases to infinity, the following expression, the weighted log-likelihood (say),

\[ \frac{1}{n} L^{\rm w}_{\phi}(\theta | y) := \frac{1}{n} \sum_{i=1}^{n} \log g(y_i | \theta)^{\phi(y_i)}, \]

tends to $\mathbb{E}_{\tilde{y}} \big[ \phi(\tilde{y}) \log g(\tilde{y} | \theta) \big]$ by the law of large numbers. Here $\phi(y_i)$ is the weight for $y_i$, and $\tilde{y}$ is supposed to be an unknown but potentially observable quantity coming from the same distribution $f$ and independent of $y$.


Next, in agreement with [?], we propose the weighted deviance information criterion (WDIC):

\[ \mathrm{WDIC} = DE^{\rm w}_{\phi}(\bar{\theta}, y) + 2 p^{\rm w}_{D}, \tag{41} \]

as an adaptation of the Akaike information criterion to the weighted case for Bayesian models. Consider the over-fitting penalty $p^{\rm w}_{D}$ given by

\[ p^{\rm w}_{D} = \mathbb{E}_{\theta | y} \big[ DE^{\rm w}_{\phi}(\theta, y) \big] - DE^{\rm w}_{\phi}(\bar{\theta}, y), \]

in order to estimate the "effective number of parameters". Here

\[ DE^{\rm w}_{\phi}(\theta, y) = -2 \sum_{i=1}^{n} \log g(y_i | \theta)^{\phi(y_i)}. \tag{42} \]

Since the full model specification in Bayesian statistics contains a prior $\pi(\theta)$ in addition to the likelihood, and inference is derived from the posterior distribution $\pi(\theta | y) \propto L(\theta | y) \pi(\theta)$, $\bar{\theta}$ can be either the posterior mean or the posterior mode. In practice the advantage of the WDIC over the DIC is observed when the data have utilities (weights) different from one.
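The criterion can be evaluated from posterior draws as in the following sketch. Everything specific here is an assumption made for illustration: the model $y_i \sim N(\theta, \sigma^2)$ with known $\sigma$, the flat prior (so that the posterior of $\theta$ is $N(\bar{y}, \sigma^2/n)$ and can be sampled directly in place of MCMC output), the central-moment weights, and the function names.

```python
import numpy as np

def weighted_deviance(theta, y, weights, sigma=1.0):
    """-2 * sum_i weights_i * log g(y_i | theta) for the model y_i ~ N(theta, sigma^2)."""
    log_g = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - theta) / sigma) ** 2
    return -2.0 * np.sum(weights * log_g)

def wdic(theta_samples, y, weights, sigma=1.0):
    """Return (WDIC, penalty), using the posterior mean as the plug-in estimate."""
    theta_bar = np.mean(theta_samples)
    dev_at_mean = weighted_deviance(theta_bar, y, weights, sigma)
    mean_dev = np.mean([weighted_deviance(t, y, weights, sigma) for t in theta_samples])
    penalty = mean_dev - dev_at_mean
    return dev_at_mean + 2.0 * penalty, penalty

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)
# Stand-in for MCMC output: exact posterior draws under a flat prior and sigma = 1.
theta_samples = rng.normal(loc=y.mean(), scale=1.0 / np.sqrt(len(y)), size=4000)
weights = (y - y.mean()) ** 2            # hypothetical central-moment weights
crit_w, pen_w = wdic(theta_samples, y, weights)
crit_1, pen_1 = wdic(theta_samples, y, np.ones_like(y))   # unit weights: the usual DIC
print(crit_w, pen_w)
print(crit_1, pen_1)
```

For this Gaussian model the penalty equals the sum of the weights times the posterior variance of $\theta$ divided by $\sigma^2$, so with unit weights it is close to one, the number of free parameters.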


Remark: It would be interesting to investigate simulation results as evidence of this fact by using the Markov chain Monte Carlo (MCMC) method. This is also one of our intentions for future work.


4 APPENDIX 


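Proof of (21):

According to (2), for the Gaussian PDF and the given WF $\phi(\mathbf{x}) = \prod_{i=1}^{3} (x_i - \mu_i)^2$, one can write:

\[ \begin{aligned} h^{\rm w}_{\phi}(\mathbf{X}) &= \frac{1}{2} \log \big( (2\pi)^3 |\Sigma| \big)\, \mathbb{E} \Big[ \prod_{k=1}^{3} (X_k - \mu_k)^2 \Big] + \frac{1}{2} \int_{\mathbb{R}^3} \prod_{k=1}^{3} (x_k - \mu_k)^2 f(\mathbf{x}) \sum_{i=1}^{3} \sum_{j=1}^{3} (x_i - \mu_i) \Sigma^{-1}_{ij} (x_j - \mu_j)\, d\mathbf{x} \\ &= \frac{1}{2} \log \big( (2\pi)^3 |\Sigma| \big)\, \mathbb{E} \Big[ \prod_{k=1}^{3} (X_k - \mu_k)^2 \Big] + \frac{1}{2} \sum_{i=1}^{3} \sum_{j=1}^{3} \Sigma^{-1}_{ij}\, \mathbb{E} \Big[ \prod_{k=1}^{3} (X_k - \mu_k)^2 (X_i - \mu_i)(X_j - \mu_j) \Big]. \end{aligned} \tag{43} \]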
Set $Y_i = X_i - \mu_i$; then $\mathbf{Y} \sim N(0, \Sigma)$ and moreover, for odd $M = r_1 + r_2 + \dots + r_n$, $\mathbb{E} \big[ \prod_{i=1}^{n} Y_i^{r_i} \big] = 0$. Now let us focus on the last expectation in (43), which takes the form:

\[ \begin{aligned} \Lambda_{ij} := \mathbb{E} \Big[ \prod_{k=1}^{3} (X_k - \mu_k)^2 (X_i - \mu_i)(X_j - \mu_j) \Big] &= \mathbb{E} \Big[ \prod_{k=1}^{3} Y_k^2\, Y_i Y_j \Big] \\ &= \mathbb{E}[Y_1^2 Y_2^2]\, \mathbb{E}[Y_3^2]\, \mathbb{E}[Y_i Y_j] + 2\, \mathbb{E}[Y_1^2 Y_2^2]\, \mathbb{E}[Y_3 Y_i]\, \mathbb{E}[Y_3 Y_j] \\ &= \big( \Sigma_{11}\Sigma_{22} + 2(\Sigma_{12})^2 \big) \big( \Sigma_{33}\Sigma_{ij} + 2\Sigma_{3i}\Sigma_{3j} \big), \end{aligned} \tag{44} \]

where the remaining terms of the expansion, such as $\mathbb{E}[Y_1^2 Y_3]\, \mathbb{E}[Y_2^2 Y_3]\, \mathbb{E}[Y_i Y_j]$, contain odd-order moments and vanish.
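Next, in (43) we need to find one more expectation:

\[ \begin{aligned} \Theta := \mathbb{E} \Big[ \prod_{k=1}^{3} (X_k - \mu_k)^2 \Big] = \mathbb{E} \Big[ \prod_{k=1}^{3} Y_k^2 \Big] &= \mathbb{E}[Y_1^2] \Big( \mathbb{E}[Y_2^2]\, \mathbb{E}[Y_3^2] + 2 \big( \mathbb{E}[Y_2 Y_3] \big)^2 \Big) \\ &\quad + 2\, \mathbb{E}[Y_1 Y_2] \Big( \mathbb{E}[Y_1 Y_2]\, \mathbb{E}[Y_3^2] + 2\, \mathbb{E}[Y_1 Y_3]\, \mathbb{E}[Y_2 Y_3] \Big) \\ &\quad + 2\, \mathbb{E}[Y_1 Y_3] \Big( 2\, \mathbb{E}[Y_1 Y_2]\, \mathbb{E}[Y_2 Y_3] + \mathbb{E}[Y_1 Y_3]\, \mathbb{E}[Y_2^2] \Big) \\ &= \Sigma_{11} \big( \Sigma_{22}\Sigma_{33} + 2(\Sigma_{23})^2 \big) + 2\Sigma_{12} \big( \Sigma_{12}\Sigma_{33} + 2\Sigma_{13}\Sigma_{23} \big) + 2\Sigma_{13} \big( 2\Sigma_{12}\Sigma_{23} + \Sigma_{13}\Sigma_{22} \big). \end{aligned} \tag{45} \]

Substituting the above expressions into (43), we obtain

\[ h^{\rm w}_{\phi}(\mathbf{X}) = \frac{1}{2} \log \big( (2\pi)^3 |\Sigma| \big)\, \Theta + \frac{1}{2} \sum_{i=1}^{3} \sum_{j=1}^{3} \Sigma^{-1}_{ij} \Lambda_{ij}, \]

which is exactly what we are looking for. □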


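Proof of (27):

Recall the conditional WE:

\[ \begin{aligned} h^{\rm w}_{\phi}(X_1, X_2 | X_3 = x_3) &= -\int_{\mathbb{R}^2} \phi(x_1, x_2) f(x_1, x_2 | x_3) \log f(x_1, x_2 | x_3)\, dx_1 dx_2 \\ &= \log \big[ 2\pi |\bar{\Sigma}|^{1/2} \big]\, \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \,\Big|\, X_3 \Big] \\ &\quad + \frac{1}{2} \sum_{i,j=1,2} \bar{\Sigma}^{-1}_{ij} \int_{\mathbb{R}^2} \prod_{k=1}^{2} (x_k - \mu_k)^2 (x_i - \bar{\mu}_i)(x_j - \bar{\mu}_j) f(x_1, x_2 | x_3)\, dx_1 dx_2. \end{aligned} \]

Observe that

\[ \begin{aligned} \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \,\Big|\, X_3 \Big] &= \mathbb{E} \Big[ \prod_{i=1}^{2} \big( (X_i - \bar{\mu}_i)^2 + 2(X_i - \bar{\mu}_i)(\bar{\mu}_i - \mu_i) + (\bar{\mu}_i - \mu_i)^2 \big) \,\Big|\, X_3 \Big] \\ &= \mathbb{E}[Y_1^2 Y_2^2] + \mathbb{E}[Y_1^2] (\bar{\mu}_2 - \mu_2)^2 + \mathbb{E}[Y_2^2] (\bar{\mu}_1 - \mu_1)^2 \\ &\quad + 4\, \mathbb{E}[Y_1 Y_2] \prod_{i=1}^{2} (\bar{\mu}_i - \mu_i) + \prod_{i=1}^{2} (\bar{\mu}_i - \mu_i)^2, \end{aligned} \]

where $(Y_1, Y_2) \sim N(0, \bar{\Sigma})$, and $\bar{\mu}$ and $\bar{\Sigma}$ are as in (23) and (24). Therefore

\[ \begin{aligned} \mathbb{E} \Big[ \prod_{i=1}^{2} (X_i - \mu_i)^2 \,\Big|\, X_3 \Big] &= \bar{\Sigma}_{11}\bar{\Sigma}_{22} + 2(\bar{\Sigma}_{12})^2 + \bar{\Sigma}_{11} \big( \Sigma_{23}\Sigma_{33}^{-1}(x_3 - \mu_3) \big)^2 + \bar{\Sigma}_{22} \big( \Sigma_{13}\Sigma_{33}^{-1}(x_3 - \mu_3) \big)^2 \\ &\quad + 4\bar{\Sigma}_{12} \prod_{i=1}^{2} \big( \Sigma_{i3}\Sigma_{33}^{-1}(x_3 - \mu_3) \big) + \prod_{i=1}^{2} \big( \Sigma_{i3}\Sigma_{33}^{-1}(x_3 - \mu_3) \big)^2 := \Theta(x_3) \ \text{(say)}. \end{aligned} \tag{47} \]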

Furthermore 


A a ■= E 


II (X k - fi k ) 2 (X 2 - mJ(Xj - M,) |^3 


k =1 

■ 2 tt 2 t; 


F^F^F,-] + - Mfc ) 2 E[F fc F,F,] 

fc=i 

2 2 

+n - ^) 2i? tw]+ 4 n fc* - ^ [ 4?; 1^2 FiFj 


^=1 


fe=i 


(46) 


(47) 


(48) 


Owing to (Fi, F 2 ) ~ N(0, E), we have the following list of assertions: 

E\Y 1 Y 2 YiYj] = Eh (saaE* + 2E2iS2j^ + 2 E 12 ^E^Sjj + EijE2j + EijE2i^ 
+En ( 2 E 12 E 2j + E 22 E 1 J ] + E]j f 2 Ei 2 E 2 i + E 2 2 Eij j, 


and for i,j,k = 1,2 

S[FiF 2 FiFj] = Ei 2 Eij + EijE2j + EijE2i, Fl[FjFjl &] = S^E j k + 2 EijEj^, 
E[F 2 F 2 ] =EiiE 22 + 2 (Ei 2 ) 2 ,^[F. t F J ] = E#, £[F 2 ] = Y kk and 
Mfc — Mfc Ej^Sgg (3:3 — Ms)- 

which leads directly to the result. □ 


(49) 
Proof of (28):


By virtue of the definition of WE with given WF 

2 


\x 1 ,x 2 )\x 3 




2=1 


f(X u X 2 \x 3 ) 
f(X U X 2 ) 


we can write: 


HZ (X 1 ,X 2 ) 

<P( A- 1 ,X 2 )|x 3 V 

f 2 


- m) 2 f(xi,x 2 \x 3 )\og f(xi,x 2 )dxidx 2 


2=1 


= -log [( 27 t) 2 |S 1 |] E 


IJiXi-m) 2 \X 3 


_ 2=1 




i,j= 1,2 L fc=l 

Here @( 0 : 3 ) has been calculated already in (|47l) . Moreover one yields 


T a ■= E 


= E 


n (X k - Hkf(Xi - - fij)\X 3 

L k=l 

r 2 

t^2 


i J k + k (Hk — Mfc) + (Hk ~ Vk) 2 )-(Yi + (Hi ~ Vi)) i + (m j ~ Mi)) 


L fc=l 


(50) 


Applying RV (F 1 ,y 2 ), d50|) becomes 

Tjj = E[F^F,] + E[F^] ( 71 , - Hi)(E L -_N)_ 

+2 (h 2 - H 2 )(H j - Mi) + 2 E[ Y l Y 2 Y j \ (h 2 - M 2 ) (Mi - Hi) 

+E[Y 2 l Y. i X J ](H 2 - ^ 2 ) 2 + E\Yl] (h 2 - H2)\Hi Z Hi){-p j ~ Mi) 

+2 E[Y l Y 2 2 Y i \ (JI, - Hi)CPj ~ Mi) + 2 E [ Y ^ Y l Y A (Mi ~ Mi)(F - Mi) 

2 

+4 E[yiE 2 y i y J ] (tr - mi)(m 2 - M 2 ) + 4 £?[yiy 2 ] JJ(7z fc - MfcX/y - mO(mj - Mi) ( 51 \ 

fc=i 

+2 E[yiEj] (Hj - Hj)(Ei -Mi)(m 2 -M 2 ) 2 + 2 A^i^iKMi - Hi)(Ei -Mi)(m 2 ~ M 2) 2 

+E[F^F ! F j ] (/Zi - Hi) 2 + E\vl] (hi - Mi) 2 (m* - M^KMi - Mi) 

+2 E[y 2 yj] (h 2 -H2)(Hi - Hi) 2 (pj - Vj) + 2 e \Y 2 Yj\(H 2 - H2 )(Hi -Mi) 2 (m* - Mi) 

_2 2 

yEfy^'i] n (Mfc - Mfc) 2 + n (Mfc - Mfc) 2 (F - Mi)(Mi “ Mi)- 

/c=l /c=l 

Follow the expectations from (1491) . Hence the final relation is concluded. □ 
$\Lambda_{ij}$ and $T_{ij}$ in Example 2.8


A n = 12p 5 (2 - p) + 3p 2 3 (2 - pf + 4x 2 p 2 (2 - p) 2 ( 1 - pf 

+ 2 p 4 ^l(l - pf + x\p{2 - p)( 1 - p ) 4 - 12 p 3 x §(2 - p)(l - pf, 

A 12 = A 2 i = —9p 4 (2 - pf - 6 p 6 - 6 x|(l - pfp 3 (2 - p) - p 2 x|(l - pf 
+ 8 p 4 * 2 (l - pf + 4p 2 x |(1 - p ) 2 (2 - p) 2 , 

A 22 = 3p 4 (2 - pf + 12 y 0 5 (2 - p) + 4p 2 x 2 (2 - p) 2 (l - p ) 2 + 2 p 4 x 2 (l - pf 
+xjp{2 - p)( 1 - p ) 4 - 12 xl /) 3 (2 - p)(l - p) 2 . 

Furthermore 

Tu = 12/9 5 (2 - p) + 3p 3 (2 - p ) 3 + 2(1 - p) 2 x|(p 2 (2 - pf + 2pf 
-24(1 - pfx 2 3 p 3 (2 - p) + 3(1 - p) 2 z§p 2 (2 - pf + (1 - pfxl 
+4(1 - pfx 2 (p 2 (2 - p ) 2 - 2p 3 (2 - p)) - 8 p 2 (l - pfx\ + 7p(2 - p)(l - pfxf 

T i 2 = T 2 i = —9p 4 (2 - p ) 2 - 6 p 6 + 9(1 - pfx 2 (p 2 (2 - p ) 2 + 2p 4 ) + (1 - p) 6 xl 
-18(1 - pfx 2 p 3 (2 - p) + 6 p (2 - p)(l - pfxj - 6 p 2 (l - p ) 4 4 


T 22 = 3p 4 (2 - pf + 12p 5 (2 - p) + 6(1 - p) 2 x|(p 2 (2 - p) 2 + 2p 4 ) 

-24(1 - pf x 2 p 3 (2 - p) + (1 - pfx\ + 7p(2 - p)(l - pf xj 
- 8 p 2 (l-p) 4 x| + 3p 2 (2-p) 2 (l-p) 2 x|. 